Data-intensive file systems for Internet services: A rose by any other name...

Authors

Wittawat Tantisiriroj

Swapnil Patil

Garth Gibson

Abstract

Data-intensive distributed file systems are emerging as a key component of large-scale Internet services and cloud computing platforms. They are designed from the ground up and tuned for specific application workloads. Leading examples, such as the Google File System, the Hadoop Distributed File System (HDFS) and Amazon S3, are defining this new purpose-built paradigm. It is tempting to classify file systems for large clusters into two disjoint categories: those for Internet services and those for high-performance computing. In this paper we compare and contrast parallel file systems, developed for high-performance computing, and data-intensive distributed file systems, developed for Internet services. Using PVFS as a representative of parallel file systems and HDFS as a representative of Internet services file systems, we configure a parallel file system into a data-intensive Internet services stack, Hadoop, and test performance with microbenchmarks and macrobenchmarks running on a 4,000-core Internet services cluster, Yahoo!'s M45. Once a number of configuration issues, such as stripe unit sizes and application buffering sizes, are dealt with, issues of replication, data layout and data-guided function shipping are found to be different, but supportable in parallel file systems. The performance of Hadoop applications storing data in an appropriately configured PVFS is comparable to that of applications using a purpose-built HDFS.
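The abstract names stripe unit size as a configuration issue and data layout and data-guided function shipping as points of difference. The minimal sketch below illustrates why these interact, assuming a simple round-robin layout in which byte offset `o` lands on server `(o // stripe_unit) % num_servers`. With a small stripe unit, a single large Hadoop input chunk is scattered over every server, so no node holds enough of it for shipping the map task to that node to pay off. With a stripe unit equal to the chunk size, one server holds the whole chunk. The function names (`servers_for_range`, `best_host`), the 50-server cluster, and the 64 KB and 64 MB sizes are illustrative assumptions, not values or code taken from the paper.

```python
from collections import Counter

KB = 1024
MB = 1024 * KB


def servers_for_range(offset, length, stripe_unit, num_servers):
    """Hypothetical helper: count how many bytes of [offset, offset + length)
    each server stores, assuming round-robin striping."""
    held = Counter()
    pos, end = offset, offset + length
    while pos < end:
        stripe_index = pos // stripe_unit
        server = stripe_index % num_servers           # round-robin placement
        stripe_end = (stripe_index + 1) * stripe_unit
        chunk = min(end, stripe_end) - pos             # bytes in this stripe unit
        held[server] += chunk
        pos += chunk
    return held


def best_host(offset, length, stripe_unit, num_servers):
    """Hypothetical helper: pick the server holding the most bytes of the range
    and return (server, fraction of the range that is local to it)."""
    held = servers_for_range(offset, length, stripe_unit, num_servers)
    server, nbytes = held.most_common(1)[0]
    return server, nbytes / length


if __name__ == "__main__":
    # The second 64 MB chunk of a file, striped with a small 64 KB unit.
    print(best_host(64 * MB, 64 * MB, 64 * KB, 50))
    # The same chunk with the stripe unit raised to 64 MB.
    print(best_host(64 * MB, 64 * MB, 64 * MB, 50))
```

In this toy model the 64 KB layout leaves the best host with only about 2% of the chunk, while the 64 MB layout places the entire chunk on one server. The sketch ignores replication; with replicas, the layout of each copy would be evaluated the same way.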


Related resources

Data-intensive file systems for Internet services: A rose by any other


Data-intensive File Systems for Internet Services: A Rose by Any Other Name... (CMU-PDL-08-114)


DiskReduce: RAID for Data-Intensive Scalable Computing (CMU-PDL-09-112)

Data-intensive file systems, developed for Internet services and popular in cloud computing, provide high reliability and availability by replicating data, typically three copies of everything. Alternatively, high-performance computing, which has comparable scale, and smaller-scale enterprise storage systems get similar tolerance for multiple failures from lower-overhead erasure encoding, or RAI...


The xDotGrid native, cross-platform, high-performance xDFS file transfer framework

In this paper we introduce and describe the highly concurrent xDFS file transfer protocol and examine its cross-platform and cross-language implementation in native code for both Linux and Windows in 32 or 64-bit multi-core processor architectures. The implemented xDFS protocol based on xDotGrid.NET framework is fully compared with the Globus GridFTP protocol. We finally propose the xDFS protoc...


A Metadata Workload Generator for Data-Intensive File Systems

Large-scale data-intensive computing [2, 3] has posed numerous challenges to the underlying distributed file system, due to the unprecedented amount of data, the large number of users, the intense competition on cost and service quality, and the emergence of new applications. As a result, there has been an increasing amount of research on scalable metadata management [4, 6], high availability [...


Publication date: 2008
